Context-specific independence mixture modeling for positional weight matrices
Authors
Abstract
MOTIVATION: A positional weight matrix (PWM) is a statistical representation of the binding pattern of a transcription factor, estimated from known binding site sequences. Previous studies showed that, for factors that bind to divergent binding sites, mixtures of multiple PWMs increase performance. However, estimating a conventional mixture distribution for each position will in many cases cause overfitting.
RESULTS: We propose a context-specific independence (CSI) mixture model and a learning algorithm based on a Bayesian approach. The CSI model adjusts its complexity to fit the amount of sequence-level variation observed at each position of a site. This not only yields a more parsimonious description of binding patterns, which improves parameter estimates, but also increases robustness, as the model automatically adapts the number of components to fit the data. Evaluation of the CSI model on simulated data showed favorable results compared to conventional mixtures. We demonstrate its adaptive properties in a classical model selection setup. The increased parsimony of the CSI model was shown for the transcription factor Leu3, where two binding-energy subgroups were distinguished as well as with a conventional mixture while requiring 30% fewer parameters. Analysis of the human-mouse conservation of predicted binding sites of 64 JASPAR TFs showed that CSI was as good as or better than a conventional mixture for 89% of the TFs and as good as or better than a single PWM model for 70% of the TFs.
AVAILABILITY: http://algorithmics.molgen.mpg.de/mixture.
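As a rough illustration of the ideas in the abstract, the following Python sketch estimates a single PWM from aligned sites and compares the number of free parameters of a conventional mixture with that of a CSI-style mixture. It is not the authors' implementation: the function names, the pseudocount, the uniform background, the toy sites, and the representation of a CSI structure as a per-position partition of components into parameter-sharing groups are all assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def estimate_pwm(sites, pseudocount=1.0):
    """Per-position nucleotide probabilities from equal-length aligned sites.

    The additive pseudocount is an assumption of this sketch.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def log_odds_score(pwm, seq, background=0.25):
    """Log2 odds of seq under the PWM versus a uniform background (assumed)."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, seq))

def free_parameters(structure, n_components):
    """Count free parameters of a mixture of PWMs.

    structure[pos] is a hypothetical encoding: a list of groups of component
    indices, where components in the same group share one distribution
    at that position.
    """
    mixture_weights = n_components - 1
    per_distribution = len(ALPHABET) - 1
    return mixture_weights + sum(len(groups) * per_distribution
                                 for groups in structure)

# Toy data, invented for illustration only.
sites = ["ACGT", "ACGA", "ACGT"]
pwm = estimate_pwm(sites)

# Conventional mixture: both components have their own distribution everywhere.
conventional = [[[0], [1]]] * 4
# CSI-style structure: components share parameters at positions 0 and 3.
csi = [[[0, 1]], [[0], [1]], [[0], [1]], [[0, 1]]]

print(free_parameters(conventional, 2), free_parameters(csi, 2))
```

In this toy setup, sharing parameters at the two positions where the components do not differ removes those positions' duplicate distributions, which is the kind of saving the abstract refers to as parsimony. How the sharing structure is actually learned (the Bayesian approach mentioned above) is not sketched here.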
Similar resources
Models of Random Sparse Eigenmatrices with Application to Bayesian Factor Analysis
We discuss a new class of models for random covariance structures defined by probability distributions over sparse eigenmatrices. The decomposition of orthogonal square matrices in terms of Givens rotations defines a natural, interpretable framework for defining prior distributions over the sparsity structure of random eigenmatrices. We explore some theoretical aspects and implications for cond...
Enhanced position weight matrices using mixture models
MOTIVATION Positional weight matrix (PWM) is derived from a set of experimentally determined binding sites. Here we explore whether there exist subclasses of binding sites and if the mixture of these subclass-PWMs can improve the binding site prediction. Intuitively, the subclasses correspond to either distinct binding preference of the same transcription factor in different contexts or distinc...
Context-Specific Independence Mixture Modelling for Protein Families
Protein families can be divided into subgroups with functional differences. The analysis of these subgroups and the determination of which residues convey substrate specificity is a central question in the study of these families. We present a clustering procedure using the context-specific independence mixture framework using a Dirichlet mixture prior for simultaneous inference of subgroups an...
Modeling long temporal contexts for robust DNN-based speech recognition
Deep Neural Networks (DNNs) have been shown to outperform traditional Gaussian Mixture Models in many Automatic Speech Recognition tasks. In this work, we investigate the potential of modeling long temporal acoustic contexts using DNNs. The complete temporal context is split into several subcontexts. Multiple sub-context DNNs initialized with the same set of Restricted Boltzmann Machines are fi...
PROMO: detection of known transcription regulatory elements using species-tailored searches
We have developed a set of tools to construct positional weight matrices from known transcription factor binding sites in a species or taxon-specific manner, and to search for matches in DNA sequences.
Journal:
- Bioinformatics
Volume 22, Issue 14
Pages -
Publication date: 2006